One-Way ANOVA
Kruskal-Wallis

July 10, 2025
Thursday

Introduction: Topics

  • Previous: two groups, continuous outcome

  • Now: more than two groups, continuous outcome

  • One-way ANOVA

    • posthoc testing
    • assumptions
  • Kruskal-Wallis

    • posthoc testing

Introduction: Analysis of Variance

  • We have previously discussed testing the difference between two groups.

    • What about when there are three or more groups?
  • We will use a method called analysis of variance (ANOVA).

    • This method partitions the variance of the outcome into variance due to the groups and variance due to “other” factors.
  • Fun fact: the two-sample t-test is a special case of ANOVA.

    • If you perform ANOVA when comparing two means, you will obtain the same results as the two-sample t-test.
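We can verify this equivalence with a quick sketch in R. The data below are simulated purely for illustration; the point is that the equal-variance two-sample t-test and one-way ANOVA on the same two groups return identical p-values, with F_0 equal to t².

```r
# Simulated two-group data (illustrative only)
set.seed(42)
scores <- c(rnorm(50, mean = 10, sd = 2), rnorm(50, mean = 10.5, sd = 2))
group  <- factor(rep(c("A", "B"), each = 50))

# Equal-variance two-sample t-test
t_res <- t.test(scores ~ group, var.equal = TRUE)

# One-way ANOVA on the same data
a_res <- anova(aov(scores ~ group))

t_stat <- unname(t_res$statistic)
F_stat <- a_res$`F value`[1]

c(t_squared = t_stat^2, F = F_stat)              # identical
c(t_p = t_res$p.value, F_p = a_res$`Pr(>F)`[1])  # identical
```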

Introduction: Analysis of Variance

  • Fun fact: the two-sample t-test is a special case of ANOVA.

  • Two-sample t-test:

Two-sample t-test for two independent means and equal variance:
Null: H₀: μ₁ − μ₂ = 0
Alternative: H₁: μ₁ − μ₂ ≠ 0
Test statistic: t(98) = 0.52
p-value: p = 0.604
Conclusion: Fail to reject the null hypothesis (p = 0.6041 ≥ α = 0.05)
  • In ANOVA,
One-Way ANOVA: 
H₀: μ_A = μ_B
H₁: At least one group mean is different
Test Statistic: F(1, 98) = 0.271
p-value: p = 0.604
Conclusion: Fail to reject the null hypothesis (p = 0.6041 ≥ α = 0.05)

One-Way ANOVA

  • In one-way ANOVA, we are partitioning the variability in our outcome (SSTotal) into two pieces:
    • Variability due to the group (SSTreatment),
    • Variability due to “other factors” (SSError).
      • Think of this like a “catch all” for other sources of error – things we did not adjust for in our model.

One-Way ANOVA: ANOVA Table

  • The computations for ANOVA are more involved than what we’ve seen before.

  • An ANOVA table will be constructed in order to perform the hypothesis test.

Source Sum of Squares df Mean Squares F
Treatment SSTrt dfTrt MSTrt F0
Error SSE dfE MSE
Total SSTot dfTot
  • Once this is put together, we can perform the hypothesis test.

    • Our test statistic is F_0.

One-Way ANOVA: (Hand) Computations

  • Before we begin our computations, it would be helpful if we know

\bar{x}, \ \ n_i, \ \ \bar{x}_i, \ \ s_i^2

  • where,
    • \bar{x} is the overall mean,
    • n_i is the sample size for group i,
    • \bar{x}_i is the mean for group i, and
    • s_i^2 is the variance for group i

One-Way ANOVA: (Hand) Computations

  • We begin our computations with the sums of squares:

\begin{align*} \text{SS}_{\text{Trt}} &= \sum_{i=1}^k n_i(\bar{x}_i-\bar{x})^2 \\ \text{SS}_{\text{E}} &= \sum_{i=1}^k (n_i-1)s_i^2 \\ \text{SS}_{\text{Tot}} &= \text{SS}_{\text{Trt}} + \text{SS}_{\text{E}} \end{align*}
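These sums of squares can be computed directly from the group summaries. A small base-R sketch (the data here are hypothetical, chosen only to illustrate the formulas):

```r
# Hand-style sums of squares from group summaries (hypothetical data)
x <- c(4.1, 5.0, 6.2,   7.3, 8.1, 6.9,   5.5, 5.8, 6.4)
g <- factor(rep(c("A", "B", "C"), each = 3))

xbar   <- mean(x)              # overall mean
n_i    <- tapply(x, g, length) # group sample sizes
xbar_i <- tapply(x, g, mean)   # group means
s2_i   <- tapply(x, g, var)    # group variances

SS_Trt <- sum(n_i * (xbar_i - xbar)^2)
SS_E   <- sum((n_i - 1) * s2_i)
SS_Tot <- SS_Trt + SS_E

# Sanity check: SS_Tot equals the total sum of squared deviations
all.equal(SS_Tot, sum((x - xbar)^2))
```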

One-Way ANOVA: (Hand) Computations

  • Each sum of squares has degrees of freedom:

\begin{align*} \text{df}_{\text{Trt}} &= k-1\\ \text{df}_{\text{E}} &= n-k\\ \text{df}_{\text{Tot}} &= n-1 \end{align*}

One-Way ANOVA: (Hand) Computations

  • Once we have the sum of squares and corresponding degrees of freedom, we can compute the mean squares.

  • In the case of one-way ANOVA, \begin{align*} \text{MS}_{\text{Trt}} &= \frac{\text{SS}_{\text{Trt}}}{\text{df}_{\text{Trt}}} \\ \text{MS}_{\text{E}} &= \frac{\text{SS}_{\text{E}}}{\text{df}_{\text{E}}} \end{align*}

    • Note that there is no \text{MS}_{\text{Tot}}!

One-Way ANOVA: (Hand) Computations

  • A note about mean squares: they are almost always constructed the same way.

\text{MS}_X = \frac{\text{SS}_X}{\text{df}_X}

  • This is important to know for future statistics courses that may have you calculating things by hand.

One-Way ANOVA: (Hand) Computations

  • Finally, we have the test statistic.

  • Generally, we construct an F for ANOVA by dividing the MS of interest by \text{MS}_{\text{E}},

F_X = \frac{\text{MS}_X}{\text{MS}_{\text{E}}}

  • In one-way ANOVA, we are only constructing the F for treatment,

F_0 = \frac{\text{MS}_{\text{Trt}}}{\text{MS}_{\text{E}}}
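Putting the pieces together, we can go from sums of squares to the F statistic by hand and confirm the result against R's built-in ANOVA. The data below are hypothetical, used only to illustrate the computation:

```r
# From sums of squares to the F statistic (hypothetical data)
x <- c(4.1, 5.0, 6.2,   7.3, 8.1, 6.9,   5.5, 5.8, 6.4)
g <- factor(rep(c("A", "B", "C"), each = 3))
k <- nlevels(g); n <- length(x)

n_i    <- tapply(x, g, length)
xbar_i <- tapply(x, g, mean)
s2_i   <- tapply(x, g, var)

SS_Trt <- sum(n_i * (xbar_i - mean(x))^2)
SS_E   <- sum((n_i - 1) * s2_i)

MS_Trt <- SS_Trt / (k - 1)   # df_Trt = k - 1
MS_E   <- SS_E   / (n - k)   # df_E   = n - k
F_0    <- MS_Trt / MS_E

# Should match R's built-in ANOVA table
tab <- anova(aov(x ~ g))
c(by_hand = F_0, built_in = tab$`F value`[1])
```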

One-Way ANOVA: (Hand) Computations

  • We are finally done constructing our ANOVA table! As a reminder,
Source Sum of Squares df Mean Squares F
Treatment SSTrt dfTrt MSTrt F0
Error SSE dfE MSE
Total SSTot dfTot

One-Way ANOVA: ANOVA Table (R)

  • We will use the one_way_ANOVA_table() function from library(ssstats) to construct the ANOVA table.
dataset_name %>% one_way_ANOVA_table(continuous = continuous_variable,
                                     grouping = grouping_variable)

One-Way ANOVA: ANOVA Table

  • In the magical land of Equestria, ponies come in different types: Unicorns, Pegasi, Earth Ponies, and Alicorns. While each group has unique abilities, some researchers in Twilight Sparkle’s lab are curious whether these pony types differ in their overall magic ability scores, a standardized measure that combines magical potential, control, and versatility.

  • To investigate this, they collect data (magical_studies) on a random sample of ponies from each pony type (pony_type). The researchers want to know if there is a difference in magic ability scores (magic_ability_score) among the four pony types. We will test at the \alpha=0.05 level.

  • Let’s first create the ANOVA table. How should we update this code?

dataset_name %>% one_way_ANOVA_table(continuous = continuous_variable,
                                     grouping = grouping_variable)

One-Way ANOVA: ANOVA Table

  • In the magical land of Equestria, ponies come in different types: Unicorns, Pegasi, Earth Ponies, and Alicorns. While each group has unique abilities, some researchers in Twilight Sparkle’s lab are curious whether these pony types differ in their overall magic ability scores, a standardized measure that combines magical potential, control, and versatility.

  • To investigate this, they collect data (magical_studies) on a random sample of ponies from each pony type (pony_type). The researchers want to know if there is a difference in magic ability scores (magic_ability_score) among the four pony types.

  • Let’s first create the ANOVA table. Our updated code:

magical_studies %>% one_way_ANOVA_table(continuous = magic_ability_score,
                                        grouping = pony_type)

One-Way ANOVA: ANOVA Table

  • To investigate this, they collect data (magical_studies) on a random sample of ponies from each pony type (pony_type). The researchers want to know if there is a difference in magic ability scores (magic_ability_score) among the four pony types.

  • Running the code,

magical_studies %>% one_way_ANOVA_table(continuous = magic_ability_score,
                                        grouping = pony_type)
One-Way ANOVA Table
Source Sum of Squares df Mean Squares F
Treatment 503.84 3 167.95 1.53
Error 14,963.20 136 110.02
Total 15,467.04 139
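We can verify the table's arithmetic directly: each mean square is its sum of squares divided by its degrees of freedom, and F is the ratio of the mean squares.

```r
# Checking the ANOVA table arithmetic: MS = SS/df, F = MS_Trt/MS_E
MS_Trt <- 503.84 / 3        # = 167.95 (rounded)
MS_E   <- 14963.20 / 136    # = 110.02 (rounded)
F_0    <- MS_Trt / MS_E     # = 1.53 (rounded)
round(c(MS_Trt = MS_Trt, MS_E = MS_E, F_0 = F_0), 2)
```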

Hypothesis Testing: One-Way-ANOVA

  • In one-way ANOVA, hypotheses always take the same form:

    • H_0: \ \mu_1 = \mu_2 = ... = \mu_k
    • H_1: at least one is different
  • Note 1: you must fill in the “k” when writing hypotheses!

    • e.g., if there are four means, your hypotheses are

      • H_0: \ \mu_1 = \mu_2 = \mu_3 = \mu_4
      • H_1: at least one is different
    • e.g., in our MLP example,

      • H_0: \ \mu_{\text{unicorn}} = \mu_{\text{earth}} = \mu_{\text{alicorn}} = \mu_{\text{pegasus}}
      • H_1: at least one is different

Hypothesis Testing: One-Way ANOVA

Test statistic:

F_0 = \frac{\text{MS}_{\text{Trt}}}{\text{MS}_{\text{E}}}

p-Value:

p = P[F_{k-1,n-k} \ge F_0]
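For the pony-type example, the p-value follows from the F table entries (F_0 = 1.53 with df = (3, 136)); a one-line sketch using base R's F distribution:

```r
# p-value for the pony-type ANOVA: F_0 = 1.53 with df_Trt = 3, df_E = 136
p <- pf(1.53, df1 = 3, df2 = 136, lower.tail = FALSE)
p
```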

Hypothesis Testing: One-Way ANOVA (R)

  • We will use the one_way_ANOVA() function from library(ssstats) to perform the hypothesis test.
dataset_name %>% one_way_ANOVA(continuous = continuous_variable,
                               grouping = grouping_variable,
                               alpha = specified_alpha)

Hypothesis Testing: One-Way ANOVA

  • In the magical land of Equestria, ponies come in different types: Unicorns, Pegasi, Earth Ponies, and Alicorns. While each group has unique abilities, some researchers in Twilight Sparkle’s lab are curious whether these pony types differ in their overall magic ability scores, a standardized measure that combines magical potential, control, and versatility.

  • To investigate this, they collect data (magical_studies) on a random sample of ponies from each pony type (pony_type). The researchers want to know if there is a difference in magic ability scores (magic_ability_score) among the four pony types. We will test at the \alpha=0.05 level.

  • Let’s now formulate the hypothesis test. How should we update this code?

dataset_name %>% one_way_ANOVA(continuous = continuous_variable,
                               grouping = grouping_variable,
                               alpha = specified_alpha)

Hypothesis Testing: One-Way ANOVA

  • In the magical land of Equestria, ponies come in different types: Unicorns, Pegasi, Earth Ponies, and Alicorns. While each group has unique abilities, some researchers in Twilight Sparkle’s lab are curious whether these pony types differ in their overall magic ability scores, a standardized measure that combines magical potential, control, and versatility.

  • To investigate this, they collect data (magical_studies) on a random sample of ponies from each pony type (pony_type). The researchers want to know if there is a difference in magic ability scores (magic_ability_score) among the four pony types. We will test at the \alpha=0.05 level.

  • Let’s now formulate the hypothesis test. Our updated code,

magical_studies %>% one_way_ANOVA(continuous = magic_ability_score,
                                  grouping = pony_type, 
                                  alpha = 0.05)

Introduction: Posthoc Testing

  • Today we have introduced ANOVA. Recall the hypotheses,

    • H_0: \mu_1 = \mu_2 = ... = \mu_k
    • H_1: at least one \mu_i is different
  • The F test does not tell us which mean is different… only that a difference exists.

  • In theory, we could perform repeated t tests to determine pairwise differences.

    • Recall that ANOVA is an extension of the t test… or that the t test is a special case of ANOVA.

    • However, this will increase the Type I error rate (\alpha).

Introduction: Posthoc Testing

  • Recall that the Type I error rate, \alpha, is the probability of incorrectly rejecting H_0.

    • i.e., we are saying there is a difference between the means when there is actually not a difference.
  • Suppose we are comparing 5 groups.

    • This is 10 pairwise comparisons!!

      • 1-2, 1-3, 1-4, 1-5, 2-3, 2-4, 2-5, 3-4, 3-5, 4-5
    • If we perform repeated t tests under \alpha=0.05, we are inflating the Type I error to 0.40! 😵
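The inflation figure comes from the familywise error formula 1 − (1 − α)^m for m independent tests. A quick check in R:

```r
# Five groups -> choose(5, 2) = 10 pairwise comparisons
n_comparisons <- choose(5, 2)

# Familywise Type I error rate if each of the 10 tests uses alpha = 0.05
# (assuming the tests are independent)
familywise <- 1 - (1 - 0.05)^n_comparisons
round(familywise, 2)   # about 0.40
```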

Introduction: Posthoc Testing

  • When performing posthoc comparisons, we can choose one of two paths:

    • Control the Type I (familywise) error rate.
    • Do not control the Type I error rate.
  • Note that controlling the Type I error rate is more conservative than not controlling it.

    • “Conservative” = more difficult to reject.
  • Generally, statisticians:

    • do not control the Type I error rate if examining the results of pilot/preliminary studies that are exploring for general relationships.

    • do control the Type I error rate if examining the results of confirmatory studies and are attempting to confirm relationships observed in pilot/preliminary studies.

Introduction: Posthoc Testing

  • The posthoc tests we will learn:

    • Tukey’s test

      • Performs all pairwise tests and controls the Type I error rate
    • Fisher’s least significant difference

      • Performs all pairwise tests but does not control the Type I error rate
    • Dunnett’s test

      • Compares each group to a control group and controls the Type I error rate
  • Caution: we should only perform posthoc tests if we have determined that a general difference exists!

    • i.e., we rejected when looking at the F test in ANOVA

Example

  • Recall the dental example from earlier,
m1 <- aov(strength ~ system, data = data)
summary(m1)
            Df Sum Sq Mean Sq F value  Pr(>F)   
system       3  200.0   66.66   7.545 0.00229 **
Residuals   16  141.4    8.84                   
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
  • Are we justified in posthoc testing? (Recall: \alpha=0.01).

Tukey’s Test

  • Tukey’s test allows us to do all pairwise comparisons while controlling \alpha.

  • The underlying idea of the comparison:

    • We declare \mu_i \ne \mu_j if |\bar{y}_i - \bar{y}_j| \ge W, where W = \frac{q_{\alpha}(k, \text{df}_{\text{E}})}{\sqrt{2}} \sqrt{\text{MSE} \left( \frac{1}{n_i} + \frac{1}{n_j} \right)}

      • q_{\alpha}(k, \text{df}_{\text{E}}) is the critical value from the Studentized range distribution.
  • We will use the TukeyHSD() function.

    • Note that this requires us to have created our model using the aov() function.
m <- aov(continuous_variable ~ grouping_variable, data = dataset_name)
TukeyHSD(m)$grouping_variable
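As a sketch of the cutoff W, we can plug the dental example's ANOVA quantities (k = 4 groups of n_i = 5, df_E = 16, MSE = 8.84, α = 0.01, all from the ANOVA table elsewhere in the deck) into the formula, using base R's qtukey() for the Studentized range critical value:

```r
# Tukey cutoff W for the dental data: k = 4 groups, n_i = n_j = 5,
# df_E = 16, MSE = 8.84 (from the ANOVA table), alpha = 0.01
k    <- 4
df_E <- 16
MSE  <- 8.84
n_i  <- 5

q_crit <- qtukey(1 - 0.01, nmeans = k, df = df_E)  # Studentized range critical value
W <- q_crit / sqrt(2) * sqrt(MSE * (1/n_i + 1/n_i))
W
```

Pairs of group means farther apart than W are declared significantly different.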

Tukey’s Test

  • Let’s apply Tukey’s to the dental data.
m <- aov(strength ~ system, data = data)
TukeyHSD(m, conf.level = 0.99)$system
  • Which are significantly different at the \alpha=0.01 level?

Fisher’s Test

  • Fisher’s allows us to test all pairwise comparisons but does not control the \alpha.

  • The underlying idea of the comparison:

    • We declare \mu_i \ne \mu_j if |\bar{y}_i - \bar{y}_j| \ge \text{LSD}, where \text{LSD} = t_{1-\alpha/2, \text{df}_\text{E}} \sqrt{\text{MSE} \left( \frac{1}{n_i} + \frac{1}{n_j} \right)}
  • We will use the LSD.test() function from the agricolae package.

    • Note that, like Tukey’s, this requires us to have created our model using the aov() function.
library(agricolae)
results <- summary(m)
(LSD.test(dataset_name$continuous_variable, # continuous outcome
          dataset_name$grouping_variable, # grouping variable
          results[[1]]$Df[2], # df_E
          results[[1]]$`Mean Sq`[2], # MSE
          alpha = alpha_level) # can omit if alpha = 0.05
  )[5] # limit to only the pairwise comparison results
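The LSD cutoff itself is easy to compute by hand. A sketch using the dental example's quantities (df_E = 16, MSE = 8.84, n_i = n_j = 5, α = 0.01) and base R's t quantile function:

```r
# Fisher's LSD cutoff for the dental data: df_E = 16, MSE = 8.84,
# n_i = n_j = 5, alpha = 0.01
df_E  <- 16
MSE   <- 8.84
n_i   <- 5
alpha <- 0.01

t_crit <- qt(1 - alpha/2, df_E)               # t_{1-alpha/2, df_E}
LSD <- t_crit * sqrt(MSE * (1/n_i + 1/n_i))
LSD
```

Note that LSD is smaller than Tukey's W for the same data, which is why Fisher's test rejects more easily.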

Fisher’s Test

  • Let’s apply Fisher’s to the dental data.
library(agricolae)
results <- summary(m)
LSD.test(data$strength, 
         data$system, 
         results[[1]]$Df[2], 
         results[[1]]$`Mean Sq`[2],
         alpha = 0.01)[5]
  • Which are significantly different at the \alpha=0.01 level?

Dunnett’s Test

  • Dunnett’s test allows us to compare each group against only the control, while controlling \alpha.

    • This has fewer comparisons than Tukey’s because we are not comparing non-control groups to one another.

    • i.e., we are sharing the \alpha between fewer comparisons now, which is preferred if we are not interested in the comparisons between non-control groups.

  • The underlying idea of the comparison:

    • We declare \mu_i \ne \mu_j if |\bar{y}_i - \bar{y}_j| \ge D, where D = d_{\alpha}(k-1, \text{df}_{\text{E}}) \sqrt{\text{MSE} \left( \frac{1}{n_i} + \frac{1}{n_c} \right)},

      • d_{\alpha}(k-1, \text{df}_{\text{E}}) is the critical value from Dunnett’s table.

Dunnett’s Test

  • We will use the DunnettTest() function from the DescTools package to perform Dunnett’s test.
library(DescTools)
DunnettTest(x=dataset_name$continuous_variable, 
            g=dataset_name$grouping_variable, 
            control = "name of control group")
  • The p-values are adjusted, so you can directly compare them to the specified \alpha.

Dunnett’s Test

  • Let’s apply Dunnett’s to the dental data.

    • We will treat “Ceramic” as the control group.
library(DescTools)
DunnettTest(x=data$strength, 
            g=data$system, 
            control = "Ceramic")
  • Which are significantly different at the \alpha=0.01 level?

Introduction: ANOVA Assumptions

  • We previously discussed testing three or more means using ANOVA.

  • We also discussed that ANOVA is an extension of the two-sample t-test.

  • Recall that the t-test has two assumptions:

    • Equal variance between groups.

    • Normal distribution.

  • We will extend our knowledge of checking assumptions today.

ANOVA Assumptions: Definition

  • We can represent ANOVA with the following model:

y_{ij} = \mu + \tau_i + \varepsilon_{ij}

  • where:

    • y_{ij} is the j^{\text{th}} observation in the i^{\text{th}} group,
    • \mu is the overall (grand) mean,
    • \tau_i is the treatment effect for group i, and
    • \varepsilon_{ij} is the error term for the j^{\text{th}} observation in the i^{\text{th}} group.

ANOVA Assumptions: Definition

  • We assume that the error term follows a normal distribution with mean 0 and a constant variance, \sigma^2. i.e., \varepsilon_{ij} \overset{\text{iid}}{\sim} N(0, \sigma^2)

  • Very important note: the assumption is on the error term and NOT on the outcome!

  • We will use the residual (the difference between the observed value and the predicted value) to assess assumptions: e_{ij} = y_{ij} - \hat{y}_{ij}
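In one-way ANOVA the predicted value \hat{y}_{ij} is simply the group mean, so the residuals are deviations from the group means. A short sketch (hypothetical data) confirming that R's residuals() returns exactly y − fitted(m):

```r
# Residuals are observed minus fitted values (hypothetical data)
y <- c(4.1, 5.0, 6.2,   7.3, 8.1, 6.9,   5.5, 5.8, 6.4)
g <- factor(rep(c("A", "B", "C"), each = 3))

m <- aov(y ~ g)

# e_ij = y_ij - yhat_ij; in one-way ANOVA, yhat_ij is the group mean
e <- y - fitted(m)
all.equal(unname(e), unname(residuals(m)))

# The fitted values are the group means
all.equal(unname(fitted(m)), unname(ave(y, g)))
```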

ANOVA Assumptions: Graphical Assessment

  • Normality: quantile-quantile plot

    • Should have points close to the 45^\circ line
    • We will focus on the “center” portion of the plot
  • Variance: scatterplot of the residuals against the predicted values

    • Should be “equal spread” between the groups
    • No “pattern”

ANOVA Assumptions: Graphical Assessment

  • Like with t-tests, we will assess these assumptions graphically.

  • We will return to the classpackage package and use the anova_check() function.

library(classpackage) 
anova_check(m)

ANOVA Assumptions: Graphical Assessment

  • Recall the dental example from last week,
library(tidyverse)
strength <- c(15.4, 12.9, 17.2, 16.6, 19.3,
              17.2, 14.3, 17.6, 21.6, 17.5,
               5.5,  7.7, 12.2, 11.4, 16.4,
              11.0, 12.4, 13.5,  8.9,  8.1)
system <- c(rep("Cojet",5), rep("Silistor",5), rep("Cimara",5), rep("Ceramic",5))
data <- tibble(system, strength)
m <- aov(strength ~ system, data = data)
summary(m)
            Df Sum Sq Mean Sq F value  Pr(>F)   
system       3  200.0   66.66   7.545 0.00229 **
Residuals   16  141.4    8.84                   
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

ANOVA Assumptions: Assessing Graphically

  • Let’s assess the assumptions,
library(classpackage)
anova_check(m)

ANOVA Assumptions: Test for Variance

  • We can formally check the variance assumption with the Brown-Forsythe-Levene test.

    • This test transforms the data and then performs ANOVA!
  • The test statistic is calculated as follows, F_0 = \frac{\sum_{i=1}^k n_i (\bar{z}_i - \bar{z})^2/(k-1)}{\sum_{i=1}^k \sum_{j=1}^{n_i}(z_{ij}-\bar{z}_i)^2/(n-k) }, where

    • k is the number of groups,
    • n_i is the sample size of group i,
    • n = \sum_{i=1}^k n_i, and
    • z_{ij} = |y_{ij} - \text{median}(y_i)|
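Because the BFL is just one-way ANOVA on the transformed values z_ij, we can reproduce it by hand with the dental data defined earlier in the deck; the result matches the leveneTest() output reported on a later slide (F_0 = 0.734):

```r
# Brown-Forsythe-Levene by hand on the dental data:
# transform to z_ij = |y_ij - median(y_i)|, then one-way ANOVA on z
strength <- c(15.4, 12.9, 17.2, 16.6, 19.3,
              17.2, 14.3, 17.6, 21.6, 17.5,
               5.5,  7.7, 12.2, 11.4, 16.4,
              11.0, 12.4, 13.5,  8.9,  8.1)
system <- factor(c(rep("Cojet", 5), rep("Silistor", 5),
                   rep("Cimara", 5), rep("Ceramic", 5)))

z <- abs(strength - ave(strength, system, FUN = median))  # group-median deviations
tab <- anova(aov(z ~ system))
round(c(F_0 = tab$`F value`[1], p = tab$`Pr(>F)`[1]), 3)  # F_0 = 0.734
```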

ANOVA Assumptions: Test for Variance

  • Hypotheses

    • H_0: \ \sigma^2_1 = ... = \sigma^2_k
    • H_1: at least one \sigma^2_i is different
  • Test Statistic

    • F_0 (take from resulting ANOVA table)
  • p-Value

    • p = P[F_{\text{df}_{\text{Trt}}, \text{df}_{\text{E}}} \ge F_0]
  • Rejection Region

    • Reject if p < \alpha.

ANOVA Assumptions: Test for Variance

  • We will use the leveneTest() function from the car package.

    • Note: I do not load the car package because it overwrites a necessary function in tidyverse.
car::leveneTest(model_results)
  • In our dental example,
car::leveneTest(m)

ANOVA Assumptions: Test for Variance

  • Hypotheses

    • H_0: \ \sigma^2_1 = \sigma^2_2 = \sigma^2_3 = \sigma^2_4
    • H_1: at least one \sigma^2_i is different
  • Test Statistic and p-Value

    • F_0 = 0.734
    • p = 0.547
  • Rejection Region

    • Reject if p < \alpha; \alpha=0.01.
  • Conclusion/Interpretation

    • Fail to reject H_0. There is not sufficient evidence to suggest that the variances are different (i.e., the variance assumption is not broken).

Introduction: Kruskal-Wallis

  • We just discussed the ANOVA assumptions.

\varepsilon_{ij} \overset{\text{iid}}{\sim} N(0, \sigma^2)

  • We also discussed how to assess the assumptions:

    • Graphically using the anova_check() function.

    • Confirming the variance assumption using the BFL.

  • If we break an assumption, we will turn to the nonparametric alternative, the Kruskal-Wallis.

Kruskal-Wallis Test

  • If we break ANOVA assumptions, we should implement the nonparametric version, the Kruskal-Wallis.

  • The Kruskal-Wallis test determines if k independent samples come from populations with the same distribution.

  • Our new hypotheses are

    • H_0: M_1 = ... = M_k
    • H_1: at least one M_i is different

Kruskal-Wallis Test

  • The test statistic is as follows:

\chi^2_0 = \frac{12}{n(n+1)} \sum_{i=1}^k \frac{R_i^2}{n_i} - 3(n+1),

  • where

    • R_i is the sum of the ranks for group i,
    • n_i is the sample size for group i,
    • n = \sum_{i=1}^k n_i = total sample size, and
    • k is the number of groups.
  • \chi^2_0 (often denoted H) follows a \chi^2 distribution with k-1 degrees of freedom.
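The statistic can be computed directly from rank sums using the dental data defined earlier. Note that the hand formula gives 12.506, while kruskal.test() (shown on a later slide) reports 12.515 because it applies a small correction for the tied pair of 17.2 values:

```r
# Kruskal-Wallis statistic by hand for the dental data
strength <- c(15.4, 12.9, 17.2, 16.6, 19.3,
              17.2, 14.3, 17.6, 21.6, 17.5,
               5.5,  7.7, 12.2, 11.4, 16.4,
              11.0, 12.4, 13.5,  8.9,  8.1)
system <- factor(c(rep("Cojet", 5), rep("Silistor", 5),
                   rep("Cimara", 5), rep("Ceramic", 5)))

n   <- length(strength)
r   <- rank(strength)          # ranks across all observations
R_i <- tapply(r, system, sum)  # rank sum per group
n_i <- tapply(r, system, length)

H <- 12 / (n * (n + 1)) * sum(R_i^2 / n_i) - 3 * (n + 1)
round(H, 3)   # 12.506 (kruskal.test reports 12.515 after its tie correction)
```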

Kruskal-Wallis Test

  • Hypotheses

    • H_0: \ M_1 = ... = M_k
    • H_1: at least one M_i is different
  • Test Statistic

    • \chi^2_0 = \frac{12}{n(n+1)} \sum_{i=1}^k \frac{R_i^2}{n_i} - 3(n+1)
  • p-Value

    • p = P[\chi^2_{k-1} \ge \chi^2_0]
  • Rejection Region

    • Reject H_0 if p < \alpha

Kruskal-Wallis Test

  • We will use the kruskal.test() function to perform the Kruskal-Wallis test.
kruskal.test(continuous_variable ~ grouping_variable, 
             data = dataset_name)
  • Applying this to our dental dataset,
kruskal.test(strength ~ system, data = data)

    Kruskal-Wallis rank sum test

data:  strength by system
Kruskal-Wallis chi-squared = 12.515, df = 3, p-value = 0.005812

Example

  • Hypotheses

    • H_0: \ M_1 = M_2 = M_3 = M_4
    • H_1: at least one M_i is different
  • Test Statistic and p-Value

    • \chi_0^2 = 12.515
    • p = 0.006
  • Rejection Region

    • Reject H_0 if p < \alpha; \alpha=0.01.
  • Conclusion/Interpretation

    • Reject H_0. There is sufficient evidence to suggest that there is a difference in strength between the four systems.

Kruskal-Wallis: Posthoc Testing

  • We can also perform posthoc testing in the Kruskal-Wallis setting.

  • The setup is just like Tukey’s – we can perform all pairwise comparisons and control for the Type I error rate.

  • Instead of using |\bar{y}_i - \bar{y}_j|, we will use |\bar{R}_i - \bar{R}_j|, where \bar{R}_i is the average rank of group i.

  • The comparison we are making:

    • We declare M_i \ne M_j if |\bar{R}_i - \bar{R}_j| \ge KW, where KW = \frac{q_{\alpha}(k, \infty)}{\sqrt{2}} \sqrt{\frac{n(n+1)}{12} \left( \frac{1}{n_i} + \frac{1}{n_j} \right)} and q_{\alpha}(k, \infty) is the critical value from the Studentized range distribution.
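The |R̄_i − R̄_j| values are just differences in average ranks; for the dental data these reproduce the obs.dif column in the kruskalmc() output below (e.g., Ceramic vs. Silistor is 10.3):

```r
# Pairwise |Rbar_i - Rbar_j| for the dental data
strength <- c(15.4, 12.9, 17.2, 16.6, 19.3,
              17.2, 14.3, 17.6, 21.6, 17.5,
               5.5,  7.7, 12.2, 11.4, 16.4,
              11.0, 12.4, 13.5,  8.9,  8.1)
system <- factor(c(rep("Cojet", 5), rep("Silistor", 5),
                   rep("Cimara", 5), rep("Ceramic", 5)))

Rbar <- tapply(rank(strength), system, mean)  # average rank per group
round(abs(outer(Rbar, Rbar, "-")), 1)         # pairwise |Rbar_i - Rbar_j|
```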

Kruskal-Wallis: Posthoc Testing

  • We will use the kruskalmc() function from the pgirmess package.
kruskalmc(continuous_variable ~ grouping_variable, 
          data = dataset_name)
  • In our example,
library(pgirmess) 
kruskalmc(strength ~ system, data = data)
Multiple comparison test after Kruskal-Wallis 
alpha: 0.05 
Comparisons
                 obs.dif critical.dif stat.signif
Ceramic-Cimara       0.2     9.871455       FALSE
Ceramic-Cojet        7.9     9.871455       FALSE
Ceramic-Silistor    10.3     9.871455        TRUE
Cimara-Cojet         8.1     9.871455       FALSE
Cimara-Silistor     10.5     9.871455        TRUE
Cojet-Silistor       2.4     9.871455       FALSE
  • Which pairs are significantly different?

Wrap Up

  • Today we have talked about assessing ANOVA assumptions and performing the nonparametric alternative, the Kruskal-Wallis.

  • Per usual, we should only look at posthoc testing when we’ve detected an overall difference with the Kruskal-Wallis.

  • Next lecture: two-way ANOVA.